Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents
Wang, Kangxu, Chen, Ze, Wei, Chengcheng, Zheng, Jiewen, He, Jiarong, Gao, Max
This paper presents the opdainlp team's solution for the GPU track of the CPDC 2025 challenge. The challenge consists of three tasks, aiming to build an in-game conversational AI that adheres to character personas, aligns with the game's worldview, and supports function calling. Considering both effectiveness and resource/time constraints during inference, we synthesized data for some of the tasks based on the datasets provided by the competition organizers. We employed Qwen3-14B with LoRA fine-tuning and model fusion, and at inference time served a single base model integrated with multiple LoRA adapters. Specifically, we used three distinct LoRA adapters to handle tool calling, response generation with tool call results, and response generation without tool call results, respectively. Multi-LoRA inference was implemented using vLLM. Our solution achieved first place in Task 1 and Task 3, and second place in Task 2, of the GPU track.
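The three-adapter setup described in the abstract might be sketched as follows. The routing rule, adapter names, and paths below are illustrative assumptions based on our reading of the abstract, not the authors' released code; the commented vLLM calls use its public multi-LoRA API but require a GPU and local adapter checkpoints.

```python
def select_adapter(needs_tool_call: bool, has_tool_results: bool) -> str:
    """Route a request to one of three LoRA adapters (names hypothetical):
    tool calling, response with tool results, or response without them."""
    if needs_tool_call:
        return "tool_calling"
    return "resp_with_tools" if has_tool_results else "resp_no_tools"

# Hedged sketch of serving the chosen adapter with vLLM's multi-LoRA
# support (paths are placeholders; requires a GPU and the checkpoints):
#
#   from vllm import LLM, SamplingParams
#   from vllm.lora.request import LoRARequest
#
#   llm = LLM(model="Qwen/Qwen3-14B", enable_lora=True, max_loras=3)
#   name = select_adapter(needs_tool_call=True, has_tool_results=False)
#   out = llm.generate(prompt, SamplingParams(temperature=0.0),
#                      lora_request=LoRARequest(name, 1, f"./adapters/{name}"))
```

Keeping one base model resident and swapping only the small adapter weights is what makes this fit the competition's resource constraints.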
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Ji, Shengpeng, Liang, Tianle, Li, Yangzhuo, Zuo, Jialong, Fang, Minghui, He, Jinzheng, Chen, Yifu, Liu, Zhengqing, Jiang, Ziyue, Cheng, Xize, Zheng, Siqi, Xu, Jin, Lin, Junyang, Zhao, Zhou
End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked, primarily because intelligent chatbots convey a wealth of non-textual information that cannot easily be measured with text-based language models like ChatGPT. To address this gap, we propose WavReward, a reward feedback model based on audio language models that can evaluate both the IQ and EQ of spoken dialogue systems with speech input. Specifically, 1) building on audio language models, WavReward incorporates a deep reasoning process and a nonlinear reward mechanism for post-training. By utilizing multi-sample feedback via a reinforcement learning algorithm, we construct a specialized evaluator tailored to spoken dialogue models. 2) We introduce ChatReward-30K, a preference dataset used to train WavReward, covering both the comprehension and generation aspects of spoken dialogue models. Its scenarios span various tasks, such as text-based chats, instruction chats over nine acoustic attributes, and implicit chats. WavReward outperforms previous state-of-the-art evaluation models across multiple spoken dialogue scenarios, improving objective accuracy over Qwen2.5-Omni from 53.4% to 91.5%. In subjective A/B testing, WavReward also leads by a margin of 83%. Comprehensive ablation studies confirm the necessity of each component of WavReward. All data and code will be made publicly available at https://github.com/jishengpeng/WavReward after the paper is accepted.
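The subjective A/B evaluation mentioned above amounts to checking how often the reward model's preferred response matches the human choice. A minimal sketch of that agreement metric, with purely illustrative scores (this is the generic metric, not the paper's evaluation code):

```python
def ab_agreement(scores_a, scores_b, human_prefs):
    """Fraction of A/B pairs where the response the reward model scores
    higher matches the human preference ('a' or 'b')."""
    agree = 0
    for sa, sb, pref in zip(scores_a, scores_b, human_prefs):
        model_pref = "a" if sa > sb else "b"
        agree += model_pref == pref
    return agree / len(human_prefs)

# Example with made-up reward scores for three A/B pairs:
rate = ab_agreement([0.9, 0.2, 0.7], [0.1, 0.8, 0.6], ["a", "b", "b"])
```

In the example, the model agrees with the human on two of the three pairs, giving an agreement rate of 2/3.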
Can xLLMs Understand the Structure of Dialog? Exploring Multilingual Response Generation in Complex Scenarios
Hu, Zhongtian, Cui, Yiwen, Li, Ronghan, Zhao, Meng, Wang, Lifang
Multilingual research has garnered increasing attention, especially in the domain of dialogue systems. The rapid advancements in large language models (LLMs) have fueled the demand for high-performing multilingual models. However, two major challenges persist: the scarcity of high-quality multilingual datasets and the limited complexity of existing datasets in capturing realistic dialogue scenarios. To address these gaps, we introduce XMP, a high-quality parallel Multilingual dataset sourced from Multi-party Podcast dialogues. Each sample in the dataset features at least three participants discussing a wide range of topics, including society, culture, politics, and entertainment. Through extensive experiments, we uncover significant limitations in previously recognized multilingual capabilities of LLMs when applied to such complex dialogue scenarios. For instance, the widely accepted multilingual complementary ability of LLMs is notably degraded. By conducting further experiments, we explore the mechanisms of LLMs in multilingual environments from multiple perspectives, shedding new light on their performance in real-world, diverse conversational contexts.
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
Ou, Jiao, Wu, Jiayu, Liu, Che, Zhang, Fuzheng, Zhang, Di, Gai, Kun
Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which can be achieved by raising diverse, in-depth, and insightful instructions that deepen interactions. Existing methods take instructions from real instructional dialogues as the learning target and fine-tune a user simulator to pose instructions. However, the user simulator struggles to implicitly model complex dialogue flows and pose high-quality instructions. In this paper, we take inspiration from the cognitive abilities inherent in human learning and propose the explicit modeling of complex dialogue flows through instructional strategy reuse. Specifically, we first induce high-level strategies from various real instruction dialogues. These strategies are then applied deductively to new dialogue scenarios, where they facilitate the posing of high-quality instructions. Experimental results show that our method can generate diverse, in-depth, and insightful instructions for a given dialogue history. The constructed multi-turn instructional dialogues outperform competitive baselines on the downstream chat model.
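The induce-then-deduce loop described above can be caricatured in a few lines: strategies induced from past dialogues are stored as reusable rules, then matched deductively against a new dialogue history to pose the next instruction. The triggers and templates here are toy illustrations, not the paper's induced strategies.

```python
# "Induced" strategies: (trigger condition, instruction template).
# In the paper these come from real instruction dialogues; here they
# are hand-written stand-ins.
STRATEGIES = [
    ("error", "Ask the assistant to diagnose the root cause step by step."),
    ("compare", "Ask for a side-by-side comparison of the options raised."),
]

def pose_instruction(history: str) -> str:
    """Deductively apply the first strategy whose trigger matches the
    dialogue history; fall back to a generic deepening instruction."""
    text = history.lower()
    for trigger, template in STRATEGIES:
        if trigger in text:
            return template
    return "Ask a follow-up question that deepens the previous topic."
```

The point of the explicit strategy store is that the dialogue-flow knowledge lives in inspectable rules rather than implicitly in simulator weights.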
Enhancing Consistency in Multimodal Dialogue System Using LLM with Dialogue Scenario
Onozeki, Hiroki, Qi, Zhiyang, Akiyama, Kazuma, Asahara, Ryutaro, Kaneko, Takumasa, Inaba, Michimasa
This paper describes our dialogue system submitted to the Dialogue Robot Competition 2023. The system's task is to help a user at a travel agency decide on a plan for visiting two sightseeing spots in Kyoto City that satisfy the user. Our dialogue system responds flexibly and stably to user requirements by controlling the dialogue flow according to predefined dialogue scenarios. We also improved user satisfaction by introducing motion and speech control based on system utterances and the user's situation. In the preliminary round, our system ranked fifth in the impression evaluation and sixth in the plan evaluation among all 12 teams.
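Scenario-controlled flow of the kind described is often implemented as a small state machine: each scenario state fixes what the system says and which state follows. The states and transitions below are hypothetical placeholders for a travel-agency scenario, not the team's actual scenario definition.

```python
# Hypothetical scenario graph for a travel-agency dialogue:
# each state maps to the state the system moves to next.
SCENARIO = {
    "greet": "elicit_preferences",
    "elicit_preferences": "propose_spots",
    "propose_spots": "confirm_plan",
    "confirm_plan": "close",
}

def next_state(state: str) -> str:
    """Advance the dialogue along the scenario; unknown states end
    the dialogue, keeping the flow stable on unexpected input."""
    return SCENARIO.get(state, "close")
```

Falling back to a terminal state on anything unrecognized is one simple way to get the "stable" behavior the abstract emphasizes.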
Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System
Zhang, Jianguo, Roller, Stephen, Qian, Kun, Liu, Zhiwei, Meng, Rui, Heinecke, Shelby, Wang, Huan, Savarese, Silvio, Xiong, Caiming
End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work enables the TOD systems with more flexibility through a simple cache. The cache provides the flexibility to dynamically update the TOD systems and handle both existing and unseen dialogue scenarios. Towards this end, we first fine-tune a retrieval module to effectively retrieve the most relevant information entries from the cache. We then train end-to-end TOD models that can refer to and ground on both dialogue history and retrieved information during TOD generation. The cache is straightforward to construct, and the backbone models of TOD systems are compatible with existing pre-trained generative models. Extensive experiments demonstrate the superior performance of our framework, with a notable improvement in non-empty joint goal accuracy by 6.7% compared to strong baselines.
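The cache mechanism above pairs a retriever with a generator that grounds on the retrieved entries. As a rough stand-in for the paper's fine-tuned retrieval module, a word-overlap ranker over cache entries illustrates the interface (the cache contents and scoring function are illustrative, not the paper's):

```python
def retrieve(query: str, cache: list[str], k: int = 1) -> list[str]:
    """Rank cache entries by word overlap with the query (a crude
    stand-in for a fine-tuned retriever) and return the top-k; the
    TOD generator would then condition on these entries plus the
    dialogue history."""
    q = set(query.lower().split())
    scored = sorted(cache, key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return scored[:k]
```

Because the cache is just a list of entries, updating the system for an unseen scenario means appending an entry, with no retraining of the backbone model.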
Fujitsu develops task-oriented dialogue technology with AI
Fujitsu Laboratories today announced the development of AI-based dialogue technology that can be set up easily and carry on a dialogue autonomously, accurately understanding a user's request while naturally eliciting the necessary information. The technology is intended primarily for customer service support. With previous technology, dialogue with computers required the preparation of dialogue scenarios laying out how to respond when certain things are said, and business systems typically operated on these scenarios. Fujitsu Laboratories has now developed a technology that structurally extracts the relationships between word meanings in input text to handle the multiple meanings, ambiguity, and other problems particular to Japanese-language expressions, enabling a highly accurate understanding of users' speech and smooth dialogue. In addition, by incorporating information from external databases such as linked open data (LOD), and by using a knowledge-based dialogue creation technology that automatically learns response options for natural dialogue from past records, Fujitsu Laboratories has developed technology that can conduct dialogue autonomously.